arXiv: 1505.07798V1 [stat.ME] 28 May 2015 


Vol. 00 (0000) 1 
DOI: 0000 


Bayesian Spectral Modeling of 
Microscale Spatial Distributions in a 
Multivariate Soil Matrix 

Maria A. Terres , Montserrat Fuentes , Dean Hesterberg 
and Matthew Polizzotto 

Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, e-mail: 
materresOncsu.edu, fuentesOncsu.edu 

Soil Science Department, North Carolina State University, Raleigh, NC 27695-7619, e-mail: 
deaii_hesterberg(§ncsu. edu, matt-polizzottoOncsu. edu 

Abstract: Recent technological advances have enabled researchers in a 
variety of fields to collect accurately geocoded data for several variables 
simultaneously. In many cases it may be most appropriate to jointly model 
these multivariate spatial processes without constraints on their conditional 
relationships. When data have been collected on a regular lattice, the multi¬ 
variate conditionally autoregressive (MCAR) models are a common choice. 

However, inference from these MCAR models relies heavily on the pre¬ 
specified neighborhood structure and often assumes a separable covariance 
structure. Here, we present a multivariate spatial model using a spectral 
analysis approach that enables inference on the conditional relationships 
between the variables that does not rely on a pre-specified neighborhood 
structure, is non-separable, and is computationally efficient. Covariance and 
cross-covariance functions are defined in the spectral domain to obtain com¬ 
putational efficiency. Posterior inference on the correlation matrix allows 
for quantification of the conditional dependencies. The approach is illus¬ 
trated for the toxic element arsenic and four other soil elements whose 
relative concentrations were measured on a spatial lattice. Understanding 
conditional relationships between arsenic and other soil elements provides 
insights for mitigating poisoning in southern Asia and elsewhere. 

Keywords and phrases: conditional dependence, lattice, non-separable 
covariance, quasi-matern spectral density, spatial modeling. 


1. Introduction 

Expansive spatial datasets have become more and more common as data have 
become easier to collect with improved technologies. In turn, this has created a 
need for computationally efficient modeling approaches that can accommodate 
these large datasets. Examples of such approaches include the predictive pro¬ 
cess (Banerjee et ah, 2008), nearest-neighbor Gaussian processes (Datta et ah, 
2014), partitioning of the spatial region (Kim et ah, 2005), covariance tapering 
(Sang and Huang, 2012), and others. Each of these approaches exhibits unique 
strengths and weaknesses, as discussed by Stein (2014). In addition, for many 
modern datasets there may be interest in modeling multiple spatial variables 
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jointly, treating them as dependent spatial random variables, in lieu of a regres¬ 
sion approach where several variables are simply being conditioned on. However, 
few of the aforementioned methods can easily accommodate these multivariate 
spatial datasets. Development of computationally efficient approaches to handle 
such data has the potential to greatly improve the extent to which researchers 
can learn about the relationships between spatial processes. 

When data have been collected on a spatial lattice, common in fields such as 
medical imaging and environmental science, some of the most common modeling 
approaches come from the family of conditionally autoregressive (CAR) mod¬ 
els, laid out by Besag (1974) and extended to the multivariate case (MCAR) by 
Mardia (1988). At their heart, CAR models have been defined such that the spa¬ 
tial observation at any location will be normally distributed with a mean that 
is a weighted average of the neighboring observations. Defined through a se¬ 
ries of conditional distributions, CAR models assume conditional independence 
between observations that are not spatially adjacent, and as such are special 
cases of Markov Random Field (MRF) models. In the literature MCAR mod¬ 
els have been criticized for possessing a poorly identihed and overly elaborate 
dependence structure, and multiple reparameterizations have been proposed to 
improve propriety and model behavior (Gelfand and Vounatsou, 2003; Sain and 
Cressie, 2007; Zhang et ah, 2009; Sain et ah, 2011). 

Within the Bayesian literature, the most common approaches to modeling 
multivariate lattice data come from the family of MCAR models. Although 
originally proposed by Mardia (1988), the most prevalent form is the adaptation 
by Gelfand and Vounatsou (2003) ensuring distributional propriety. In this form 
the covariance structure is separable and can be decomposed into a covariance 
matrix describing the (non-spatial) relationship between the variables and a 
second covariance matrix describing the spatial dependence shared across all 
variables. Although this is computationally efficient, it is quite restrictive in its 
description of the marginal spatial dependencies of the variables. Alternative 
formulations, such as Jin et al. (2005) and Jin et al. (2007), allow for non- 
separable formulations but become more computationally expensive. Finally, 
all of these approaches rely on a pre-defined neighborhood structure that limits 
the shape and extent of the spatial dependence in a way that is avoided by 
working with the spectral approach we propose here. 

Although the MCAR models allow for consideration of multivariate spatial 
processes, they only maintain their computational efficiency if one is willing to 
assume a separable covariance structure. In addition, the Markovian structure 
of the model constrains spatial dependence to a set of neighbors pre-determined 
through an adjacency matrix. This is in contrast to geostatistical approaches 
where spatial dependence is assumed to decay as a smooth function of distance 
and the covariance parameters. In situations where there is interest in jointly 
modeling multivariate spatial lattice data while avoiding the separable Marko¬ 
vian structure and propriety issues inherent in MCAR models, spectral analysis 
procedures provide a natural framework to turn to. Computations are conducted 
after transforming the data into the “spectral domain,” allowing for greater ef¬ 
ficiency as discussed in Section 2. The literature on spectral analysis techniques 
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is prolific in time series contexts (Priestley, 1981; Koopmans, 1995), and also 
fairly common in the frequentist spatial literature (Stein, 1999). However with 
a few exceptions, e.g. Handcock and Stein (1993), Reich and Puentes (2012), 
and Stroud et al. (2014), there has been relatively little work in this area when 
modeling spatial data within a Bayesian framework. 

In this paper we present a joint multivariate spatial model using spectral 
analysis procedures to gain computational efficiency. Each spatial variable has 
a spectral density controlling the marginal spatial covariance function, while a 
correlation matrix controls the relative cross-covariances between the variables. 

The model is fit in a Bayesian framework, allowing for uncertainty quantification 
in all aspects of the model. In particular, posterior examination of the correlation 
matrix provides inference on the nature of the conditional dependencies between 
the variables. 

This methodology is illustrated for an analysis of accumulation of poten¬ 
tially toxic arsenic in a soil matrix. Synchrotron X-ray fluorescence microprobe 
(/i-XRF) analysis was used to map accumulated arsenic in relation to other 
chemical elements in thin coatings on a quartz sand grain collected from a soil 
sample. The technique produces multivariate spatial lattice maps that essen¬ 
tially reflect relative abundances of elements. Interest lies in understanding the 
conditional dependencies between arsenic and the other soil components, and 
whether these dependent components serve to mitigate or potentiate the accu¬ 
mulation of arsenic. Arsenic contamination of drinking water is a widespread 
human health concern, particularly in Asian countries where over 100 million 
people routinely consume dangerous levels of arsenic because of their reliance on 
arsenic-contaminated well water for drinking water (Ravenscroft et al., 2009). 

Several methods of water treatment have been studied with varying success, 
including the introduction of additional chemical salts or solutions that will react 
with the arsenic (Jiang et al., 2012; Komarek et al., 2013). However, because soils 
comprise multiple elements in multiple mineral and organic components, a more 
precise understanding of chemical reactions could be aided through a statistical 
description of the soil elements’ dependencies. Unlike the common approach of 
modeling X-ray absorption spectra from soils to identify pure chemical species 
(Manceau et ah, 2014), our approach aims to identify via element associations 
possible interactions between different soil components that produce uniquely 
complex species, or cause the reactivity of any pure soil species to differ from 
that of their model analogues studied in isolation of soil. 

Studying the pairwise behavior of soil elements, a common approach for anal¬ 
ysis of microscale soil chemical data, provides only a very limited view of their 
relationships, and neglects to account for interactions between the elements. 

This is in contrast to our hypothesis that interactions between soil components 
inferred from element pairs may depend on other co-localized element com¬ 
pounds. In order to capture these kinds of behaviors, it is necessary to have a 
model that is sufficiently flexible in its treatment of conditional dependencies. 

The model developed here is illustrated for arsenic and several co-localized 
elements. In natural systems, the mobility and toxicity of arsenic is largely con¬ 
trolled by various competing abiotic and biotic redox and adsorption processes 
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involving organic matter and iron, aluminum, and manganese oxides (Borch 
et al., 2009). Many of these reactions have been studied using single or binary 
mixtures of aqueous species, model minerals and/or organic components. In 
contrast, less mechanistic detail is known about arsenic behavior in soils, which 
comprise diverse assemblages of minerals and organic matter, and conditional 
dependency models would enable scientists to better understand how arsenic 
behaves when multiple components are co-localized within complex soil matri¬ 
ces. 

The lattice structure of the /r-XRF data make spectral procedures a natu¬ 
ral choice for model-fitting. This approach was previously explored by Guinness 
et al. (2014) in a frequentist setup, but the methodology could not accommodate 
more than three spatial variables and lacked adequate measures of uncertainty. 
The need for additional model flexibility and improved uncertainty quantifica¬ 
tion indicated a fresh look at the problem from a Bayesian perspective. Unlike 
the model developed by Guinness et al. (2014), the modeling framework we 
outline can easily accommodate any arbitrary number of spatial variables. We 
illustrate the methodology with five soil elements, including the three that were 
previously analyzed, providing full descriptions of the uncertainty associated 
with our estimates and producing conditional dependence graphs that clearly 
illustrate the relationships between the variables. 

The remainder of this article is organized as follows. Section 2 provides an 
overview of the spectral analysis approach in the spatial domain, outlining some 
properties of the commonly used approximations. In Section 3 the soil mineral 
data are presented as an illustrative example, and the modeling details are 
specified. In Section 4 the model results and potential implications are discussed. 
Finally, Section 5 concludes with a brief discussion of the presented methodology 
and its potential for future work. 

2. Spectral Methods 

2.1. Computing the Likelihood 

Consider a spatial process Z = (Z(si),..., Z(s 7 v))' assumed to be a real¬ 
ization from a Gaussian process with stationary covariance function c(h) = 
cov(Z(s),Z(s -I- h)) that depends on parameters 0. Let Eg denote the corre¬ 
sponding covariance matrix. Then Z ~ A^(0, Eg) and the log-likelihood can be 
computed, 

log(p(Z|0)) = -i(logdetEe + Z%-iZ) (2.1) 

where proportionality constants have been ignored. Normal likelihoods are gen¬ 
erally easy to compute when N is small and are commonly used in spatial 
modeling (Stein, 1999; Cressie and Wikle, 2011; Banerjee et ah, 2014). How¬ 
ever, the matrix inverse E/"^ requires 0{N^) floating point operations (flops), 
rendering computation of this likelihood undesirable when working in large di¬ 
mensions. When the observations are available on a lattice of size N = ni x n 2 , 
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denoted Jjv, then it is convenient to work in the spectral domain since the nor¬ 
mal log-likelihood can be approximated using the Whittle likelihood (Whittle, 
1954; Zimmerman, 1989). This approximation takes advantage of the compu¬ 
tational efficiency of the Fast Fourier Transform (FFT), dramatically reducing 
computation time. This approach is outlined below. 

Bochner’s Theorem states that any stationary covariance function can be 
represented as an inverse Fourier transform, 

c(h) = f exp{iuj'h.)dF{uj) (2.2) 

JR2 

where h is a separation vector for spatial locations s and s-l-h, and uj = (wi, W 2 ) 
is a bivariate spectral frequency. We assume there exists some continuous differ¬ 
entiable f{u)) such that dF{uj) = f{uj)duj, commonly referred to as the spectral 
density. The Spectral Representation Theorem states that the spatial process 
associated with this covariance function can similarly be represented, 

Z{s) = ( exp(iu}'\\)dZ(u3) (2.3) 

where dZ{u}) have uncorrelated increments and E\dZ{u})\’^ = /(uj). 

When the data have been observed on a grid there is inadequate information 
to fully recover the continuous process. In particular, if the data are observed 
at uniformly spaced locations with intervals of length S, i.e. the process can 
be written Z(Sx) for x € Z^, then the associated spectra are constrained to 
frequencies in the finite interval — tt/J < uj < tt/J. This can be seen by observing 
that exp(ia;'h) = exp(i(u; -|- 27rj/5)'h), the so called “aliasing” phenomenon. 

When working with lattice data the aliasing must be accounted for in the 
choice of the associated spectral density. One approach is to choose a spectral 
density defined on the real line, then accumulate the density over all aliased 
frequencies. An alternative is to work with a spectral density that has been 
explicitly defined to have support on a) S 7r/(5]^. We follow the latter ap¬ 

proach in our analyses, selecting the quasi-Matern spectral density introduced 
by Guinness et al. (2014). To ensure a real-valued process, the spectral densi¬ 
ties must additionally be even functions symmetric around zero, which is again 
satisfied by the quasi-Matern spectral density. 

In practice we cannot compute the integrals in (2.2) and (2.3), so we instead 
approximate them with discrete sums. 


c(h) = ^ e*“>/(u;,) 

(2.4) 



Z(s) = ^ e*“^-^(a;,) 

(2.5) 




evaluated at the Fourier frequencies = (/iTtji/ni, Zk j "H’2)■, j = (ji,^ 2 ) € Jjv- 
These approximations have a long history of use in spectral modeling of spatial 
processes, and their properties are well established (Whittle, 1954; Guyon, 1982). 
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Due to the lattice structure in the data, the corresponding covariance matrix 
Eg will be block circulant and the FFT will effectively diagonalize Eg. This en¬ 
ables the log-likelihood to be rewritten in the spectral domain, with computation 
limited by the FFT requiring only 0{N log N) flops, 

log(p(Z|0)) = -^ E ^ogUiu:^)) + E FM{Z,rf{u:,)-^FN{Z,) (2.6) 

\j^lN i^JiV / 

where F]si{7i) is an array denoting the 2-dimensional Fourier transformation 
of the lattice data Z such that the entry Fn^Zj) corresponds to the Fourier 
frequency u}j. This equality can be shown in part by noting that the eigenvalues 
of Eg correspond to the spectral density /(w) evaluated at each of the Fourier 
frequencies. 

The disadvantage of this approximation is that the discreteness of (2.4) and 
(2.5) produces the generally undesirable property of periodicity over the range 
of the data. An adjustment is necessary to mitigate this feature. Some new 
approaches have recently been proposed in the literature involving an imputed 
expansion of the lattice such that the periodicity occurs beyond the domain 
of the observed data (e.g. Guinness and Fuentes, 2014; Stroud et ah, 2014). 
However, we follow the more common approach of data tapering (Dahlhaus, 
1983; Dahlhaus and Kiinsch, 1987). Intuitively, tapering dampens the data along 
the edges towards zero in a smooth way such that the lattice could be folded 
into a torus shape without introducing discontinuities. 

Specifically, we implement a cosine taper, or Tukey taper (Tukey, 1967), 
defined as, 

ri(l-kcos(y(j - §))), 0<j<§ 

Mj) = < 1, i < J < 1 - i (2.7) 

[i(l + cos(^(j-l + i))), l-i<j<l 

for dimensions d = 1, 2 and j = Q/ng, ■ ■ ■, (rid — 1) /rid- The parameter r controls 
the extent of the tapering, typically chosen to be 5-10% of the observations at 
each boundary. The original data Zj = in the likelihood in (2.6) is then 

replaced with the tapered data Zj^^^ x wi{ji) x W 2 (j 2 ), and a multiplicative 

2 na 

adjustment of J)[ ^ Wd(j)'^ is incorporated into the Fig(Zj) terms. 

d=ij=i 

When the data are multivariate, with lattice observations for M spatial vari¬ 
ables Z = (Z(^\ ..., the multivariate process will require a positive def¬ 

inite M X M matrix of spectral densities, f(ti;). The matrix entries along the 
diagonal, fm,m(<^), dictate the marginal covariances for each of the variables. 
Similarly, the matrix entries on the off-diagonal, dictate the cross¬ 

covariances between each pair of variables. The likelihood can then be computed 
similarly to the univariate case, 

log(p(Z|0)) = -^ ( E logdetf(a;,) + E i"iv(Z,)*f(a;,)-iF^(Z,) (2.8) 

\j^lN j 
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where is an array denoting the 2-diniensional Fourier transformation of 

the lattice data for spatial variable such that entry Fn(z/) corresponds to 

the Fourier frequency ujj, the Mxl vector FN{Zj) = {Fn{z/), ..., F]y{z/'^)y 
concatenates the elements of these matrices corresponding to the frequency ijJj 
for each spatial variable, and Fjv(Zj)* is the analogous vector corresponding to 
the complex conjugate transpose of the Fourier transformed lattice data. Data 
tapering can be conducted in the same manner as in the univariate case. 

Similar to the univariate case, computation of the multivariate likelihood 
(2.8) is limited only by the computation of the FFT on each of the spatial 
variables and the inversion of the M x M matrix f(a;). 

3. Multivariate Spatial Sand Grain Model 
3.1. Sand Grain Data 

Data generated for this analysis were obtained by introducing solutions of ar¬ 
senic to a soil sand grain and subsequently assessing the microscale spatial 
distributions of arsenic and a number of native soil elements. The sand grain 
analyzed was separated from a surface soil sample collected from a forest at the 
Central Crops Research Station in Clayton, NC. The spatial distributions of 
elements were mapped using /i-XRF using Beamline X27A at the National Syn¬ 
chrotron Light Source (NSLS), Brookhaven National Laboratory. Our analysis 
focuses on element maps collected after sequential treatments of the grain with 
100 and 1000 fjM arsenic (111) treatments. A 350x450 /im region of interest 
(ROI) was mapped using an X-ray beam of approximately 10x10 ^m^ to yield 
a 35 X 45 pixel array. Resulting elemental maps reflect the relative abundance 
of elements within each lOxlOx ~15 /rm^ voxel analyzed by the incident X-ray 
beam. 

We focus our analysis on the abundant soil matrix element arsenic (As), as 
well as four trace elements: iron (Fe), chromium (Cr), nickel (Ni) and zinc (Zn). 

The elements are modeled on the log-scale and, since our interest lies primarily 
in understanding the dependence structure between these elements, we center 
each spatial process by subtracting the sample mean and assume zero means in 
our modeling. Arsenic and iron were most abundant, averaging 7.23 and 9.47 
on the log-scale, with the other elements averaging between 5.35 and 5.60. To 
specify the model, let Z("®)(s) be the centered log fluorescence signal at location 
s G Jat for soil element m, for m = 1,...,M, with M = 5. The Z*'"‘^(s) are 
spatial quantities each assumed to be a realization from a Gaussian process and 
are modeled jointly in the spectral domain, as described in Section 2. 

The spatial maps of these five soil elements are provided in Figure 1. Note 
that some concentrated hotspots of iron, chromium and nickel (e.g. at coordi¬ 
nates (300,200)) are known contaminants from stainless steel deposited on the 
sample during handling prior to reaction. The correlation between arsenic, iron, 
and to some extent chromium is readily apparent, with similar spatial features 
throughout the region. The correlations with nickel and zinc are less apparent. 
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Through our joint modeling procedure we seek to make inference regarding the 
conditional dependencies exhibited by these five elements. 
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Fig 1. The logged and centered soil components arsenic, iron, chromium, nickel, and zinc. 


3.2. Marginal Spectral Densities 

Recall that the covariance function, c(h), for each soil element is defined in 
the spectral domain using a spectral density, /{us). The spectral densities are 
constrained to be even and to have support on [—tt/J, 7r/(5]^ in order to produce 
real-valued realizations and to avoid the aliasing effect that occurs with lattice 
observations, as described earlier. With these requirements in mind, we focus 
on the non-separable quasi-Matern spectral density defined on a lattice, 

2 

fiu)) = f{uJi,UJ2) = —- - „ ^ --- (3.1) 

^ ‘ 2^ (l + (a/(5)2(sin2((5a;i/2)+sin2(5a;2/2)))-+i ^ ^ 

for a; = {u)i,u} 2 ) S [—tt/5,tt/5Y as described by Guinness et al. (2014). This 
spectral density is attractive in part because it approaches the spectral density 
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of the isotropic Matern covariance function as <5 —>■ 0. The parameter a func¬ 
tions as a range parameter, indicating the distance at which spatial dependence 
becomes negligible. The parameter functions as a variance parameter, con¬ 
trolling the magnitude of the spatial correlation. Due to concerns regarding lack 
of identifiability between the parameters (Zhang, 2004), we fix = 1 in our 
model fitting. We additionally assume that each 10 x 10/xm grid cell represents 
one unit of distance, setting (5=1. 

3.3. Cross-Covariance Structure 

To guarantee a well defined spatial process the M x M matrix f(u;), corre¬ 
sponding to the cross-covariance matrix in the spatial domain, must be positive 
definite. To guarantee this, we follow the approach of Guinness et al. (2014) and 
write 

f(a;) = diag(/^/^(a;))p(a;) diag(/^/^(a;)) (3.2) 

where diag(/^/^(u;)) is a diagonal matrix with diagonal entries /m^(w), the 
square root of the marginal spectral density for the mth element as defined in 
(3.1), for m = 1,..., M. Here, p{<^) is the coherence matrix, a correlation matrix 
with ones on the diagonal and elements Pm,m' (‘*^) describing the correlation 
between components m and m'. This is analogous to decomposing a covariance 
matrix into a correlation matrix and vectors of standard deviations, where here 
the standard deviations are the square roots of the spectral densities instead of 
the square roots of the variances. 

Even in relatively low dimensions, modeling the matrix p{u}) will require the 
estimation of many correlation functions Pm,m' (‘^)- For example, in our example 
with 5 soil elements there would be 5 x (5 — l)/2 = 10 functions that need to 
be estimated in the p{oj) matrix. Estimating this many functions may rapidly 
become prohibitive as the dimension increases. As an alternative, we propose 

f(u;) = diag(/i/2(u;))p diag(/i/2(c^)) (3.3) 

where the matrix p is constant across frequencies. This assumption implies that 
for any pair of soil elements, the spatial dependence described by the cross¬ 
covariance functions will be dictated solely by the pair of marginal covariance 
functions and a multiplicative factor Pm,m' ■ While this assumption is primarily 
motivated by the need for computational efficiency, the resulting model is also 
better identified and still defines a very flexible framework deemed sufficient to 
describe the /r-XRF data. 

Decomposing f(ti;) as in (3.3) additionally simplifies the computation of the 

M 

likelihood. Specifically, it is straight forward to show that det f (w) = (det p) J([ 

m—l 

and to rewrite the inverted matrix f(a;)“^ = diag(/“^/^(a;))p“^diag(/“^/^(a;)). 
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Then, substituting into the multivariate likelihood from (2.8) we have, 

1 ^ 

log(p(Z|6»)) =- -(^iVlogdetp + EE log/m(‘^j) (3.4) 

iGJw m=l 

+ ^ F^(Z,)*diag(/-i/2(a;))p-Miag(/-i/^(a;))F^(Z,)) 

Note, during model fitting the inversion and determinant of p need only be 
computed once at each iteration since it does not depend on the frequency. 

3.4- Conditional Independencies and the Correlation Matrix 

As described by Guinness et al. (2014), conditional independencies between the 
soil components can be explored through examination of f(a;)“^, or equivalently 
through examination of p~^ under our proposed parameterization. The Bayesian 
paradigm allows us to consider uncertainties for each element of the correlation 
matrix enabling examination of the implied conditional independencies, similar 
to the approach followed by Hoff (2007). Specifically, consider a vector of normal 
random variables {zi, z^, ■ ■ ■, Zn)' with associated correlation matrix C. Then 
in the conditional distribution for Zj given the other variables, p{zj\z-j), the 
“coefficients” on Z-j will be _j]- The sign of these “coefficients” and 

whether the credible intervals overlap zero will provide inference on the graph 
of conditional dependence between variables. In this manner, the conditional 
independencies between the soil elements will be explored in our model through 
the correlation matrix p. 

3.5. Prior Specification and Model Fitting 

The following prior distributions are assumed for the parameters, 

am\s^ ~ TA^(0.oo)(0,s^) 

~/G(2,2) 

o-Q ~ /G(zzo/2, voa'^/2) 
a2|j.o~/G(2,2) 
p{vo) oc e~''° 
p{p) oc 1, p e 

where izq G and 77® is the space of all 5 x 5 positive definite correlation 
matrices. Each spatial process Zm has a unique pair of parameters (am, ^m), but 
hyperpriors are placed on these parameters to facilitate sharing of information 
across soil elements. The prior on p follows the suggestion of Barnard et al. 
(2000), who similarly decomposed a covariance matrix into standard deviations 
and a correlation matrix. 
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We make inference on the model parameters through Markov chain Monte 
Carlo (MCMC). The am parameters were updated with a random walk Metropo¬ 
lis Hastings step with the proposal variance tuned during a burn-in period to 
achieve acceptance rates between 0.3 and 0.5. The cr^ parameters are also up¬ 
dated with a Metropolis Hastings step, but in this case a more clever pro¬ 
posal distribution can be used. If the spatial process Zm were being modeled 
marginally, then cr^ would be conjugate, 

E -Pw(Zj™^)*-P'Af(Zj’”^)(l + am(sin^(wij72)+sin^(w2j72)))^ 

_2 la + J jeJiv 

~ — 2 —’- 2 - 

where 6 is the collection of all other parameters. This conjugacy breaks down 
in the multivariate case, but we utilize this marginal conjugate distribution as 
a proposal in the Metropolis Hastings step with good results. The p matrix is 
updated with a griddy Gibbs step (Ritter and Tanner, 1992), as suggested by 
Barnard et al. (2000). The hyperparameters s^,r'o,crg are updated via straight¬ 
forward Gibbs steps. 

4. Analysis Results 

The model described above was fitted to the /rXRF data with a 10% taper yield¬ 
ing covariance and cross-covariance functions for each of the five soil elements 
and the ten soil element pairs. To check for sensitivity, the model was addition¬ 
ally run for a 5% and 15% taper with comparable results (not shown here). The 
covariance functions are provided in Figure 2 and indicate varying marginal vari¬ 
ances and spatial ranges across the soil elements, with iron having the longest 
spatial dependence and zinc having the shortest. The ability to capture this 
variability in spatial dependence is a direct outcome of the flexible spectral 
specification of the model and would be far more difficult to capture in a MRF 
model with a pre-specified neighborhood structure. In addition, the Bayesian 
approach provides straightforward descriptions of the uncertainty surrounding 
these covariance functions, shown here through grey shading representing 95% 
pointwise credible intervals. 

The cross-covariance functions are provided in Figure 3 and similarly illus¬ 
trate the benefits of this spectral Bayesian modeling approach. In particular, the 
credible intervals for the cross-covariances between arsenic/nickel, arsenic/zinc 
and nickel/zinc clearly overlap zero. If one were to plot only the estimated co- 
variance functions, then it would appear that these soil elements are minimally 
positively correlated with one another. However, the 95% credible intervals en¬ 
abled by the Bayesian framework make it quite clear that one cannot infer any 
pairwise relationships between these pairs of elements. 

In addition to learning about the pairwise relationships between these soil el¬ 
ements, researchers are particularly interested in learning multi-element effects 
on the chemical reactivity of an element, i.e., the nature of pairwise relation¬ 
ships when we condition on the presence of other elements in the sample. Using 

imsart-generic ver. 2014/10/16 file: ArXiv_SoilProject_26May2015.tex date: May 29, 2015 




M. A. Terres et al./Bayesian Spectral Modeling of Microscale Spatial Distributions 12 



Fig 2. Estimated covariance functions (95% pointwise intervals in grey) for each of the five 
soil components, plotted for different distances h (pm). 


the approach outlined in Section 3.4, for each soil element we consider the “co¬ 
efficients” that would be placed on the other soil elements in the conditional 
distribution implied by the fitted joint model. Posterior estimates and credible 
intervals for these “coefficients” are provided in Figure 4, with each subplot 
corresponding to one of the five soil elements. For example, looking at the first 
subplot one can see that arsenic has a strong relationship with iron and a slight 
negative relationship with nickel, but does not have a significant relationship 
with chromium and zinc. In conjunction with Figure 3, we can infer that ar¬ 
senic and chromium will be positively correlated when examined pairwise, but 
that this relationship disappears when the other soil elements are accounted for. 

To illustrate these conditional relationships more intuitively, a conditional de¬ 
pendence graph is provided in Figure 5. Each of the soil elements is represented 
by a node, with edges connecting the nodes when there is a positive (indicated 
by a or negative (indicated by a ‘-’) conditional relationship between the 
elements. For example, there is no edge connecting arsenic and chromium, sug¬ 
gesting that they are independent conditional on nickel and iron. That is, the 
observed pairwise correlation between arsenic and chromium can be explained 
statistically via the relationships each have with nickel and iron. 

Finally, as a form of model-checking, we can compare the analysis results for 
arsenic, iron and chromium to those of Guinness et al. (2014). The covariance 
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functions in Figure 2 are consistent with those found in the previous work, as are 
the shapes of the cross-covariance functions in Figure 3. This agreement is en¬ 
couraging and suggests that minimal meaningful flexibility was lost by setting 
p{uj) = p. Additionally, Guinness et al. (2014) explored the dependence be¬ 
tween arsenic and chromium through a hypothesis testing approach and found 
“strong but not overwhelming evidence for conditional dependence.” In con¬ 
trast, our analysis suggests that when nickel and zinc are incorporated into the 
model, the conditional dependence between chromium and arsenic is no longer 
significant. I.e., not surprisingly, the nature of these conditional dependencies 
will change depending on the soil elements being accounted for. This highlights 
the importance of being able to accommodate several variables of interest, pre¬ 
viously impossible using the earlier model. 

Our model can also be checked with respect to the known chemistry of arsenic. 
Scanning electron microscopy - energy-dispersive X-ray (SEM-EDX) analysis 
indicated that visibly dark spots on the sand-grain sample corresponded with 
hotspots (Eigure 1) that are enriched in iron, chromium, and nickel at a ratio 
corresponding with stainless steel (SEM-EDX data not shown). We deduced that 
the contamination was introduced to the sample during mounting with stain¬ 
less steel forceps. Stainless steel adsorbs arsenate, which explains the suppressed 
arsenic accumulation at the locations of iron-chromium-nickel co-enrichment. 

Since the largest relative abundance of nickel was due to the stainless steel, this 
contamination explains the negative conditional relationship observed between 
arsenic and nickel. In contrast, the marginal and conditional relationships be¬ 
tween arsenic and iron were both identified to be positive. This result can be 
explained by noting that the most common form of iron in the sample (Ee(III)) 
was natural and has a high capacity to bind arsenic, and that the competing 
effects of arsenic adsorption by stainless steel is known to be trivial in compar¬ 
ison. 

5. Discussion 

The framework we have outlined enables joint spatial modeling of multiple spa¬ 
tial processes in a computationally efficient manner through the use of Eourier 
transformations and spectral analysis theory. The non-separable covariance struc¬ 
ture allows a unique covariance function for each of the variables, and a unique 
cross-covariance function for each pair of variables. By constructing the model 
in a Bayesian setup, we can fully quantify the uncertainty associated with our 
estimates and make inference on the strength of pairwise variable relationships. 
Post-model fitting examination of the correlation matrix additionally provides 
inference on the conditional relationships between the variables. 

This approach was illustrated for soil elements mapped in thin mineral- 
organic coatings on the surface of a quartz soil-sand grain using /r-XRE analysis, 
including elements that are known to occur at least partially as non-reactive 
stainless-steel contamination. Here we are able to provide statistical inference 
on the pairwise relationships between five soil elements, and can go one step fur¬ 
ther by providing a full description of all conditional relationships. Unlike our 
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new approach, previous work analyzing these data was not able to account for 
more than three soil elements at a time and also lacked adequate descriptions 
of uncertainty (Guinness et ah, 2014). 

Our modeling approach provides a framework for assessing conditioning of 
multiple soil elements that can have different geochemical effects on arsenic ac¬ 
cumulation, and analyses here could reasonably be expanded with additional 
spatial data sets for elements such as aluminum, manganese, and carbon, which 
also are known to impact arsenic reactivity. Ultimately, the approach devel¬ 
oped here could provide a useful tool for more broadly regulating negative im¬ 
pacts of environmental pollutants. Our current understanding of mechanisms of 
chemical-contaminant reactivity is largely gleaned from laboratory experiments 
involving pure, simplified systems. Interactive effects between pollutants, multi¬ 
ple minerals and/or organic matter co-localized in soil microsites are difficult to 
disentangle using current analytical techniques, thereby limiting our predictive 
capabilities in natural environmental settings. An understanding of these in¬ 
teractions from experiments conducted directly with soil materials would allow 
us to formulate hypotheses of chemical reactivity in complex systems and im¬ 
prove mechanistic models that, for instance, help predict the soil and sediment 
controls on arsenic poisoning of Asian well water. 

In its current form the proposed methodology is appropriate for jointly mod¬ 
eling multiple spatial variables observed on a complete lattice. However, future 
adaptations can be envisioned to accommodate such spatial processes that are 
non-stationary (Fuentes, 2002), and/or were collected on an incomplete lattice 
(Fuentes, 2007; Stroud et ah, 2014). 
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Fig 5. Reduced conditional dependence graph for the five soil components. 
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